Abstract

We consider vectors from $\{0,1\}^n$. The weight of such a vector $v$ is the
sum of the coordinates of $v$. The distance ratio of a set $L$ of vectors is
$\mathrm{dr}(L) := \max\{\rho(x,y) : x,y \in L\} / \min\{\rho(x,y) : x,y \in L,\ x \neq y\}$, where
$\rho(x,y)$ is the Hamming distance between $x$ and $y$. We prove that (a) there
are no positive constants $\alpha$ and $C$ such that every set $K$ of vectors with
weight $p$ contains a subset $K'$ with $|K'| \geq |K|^\alpha$ and $\mathrm{dr}(K') \leq C$, even
when $|K| \geq 2^p$; (b) for a set $K$ of vectors with weight $p$ and a constant
$C > 2$, there exists $K' \subseteq K$ such that $\mathrm{dr}(K') \leq C$ and $|K'| \geq |K|^\alpha$, where
$\alpha = 1/\lceil \log(p/2)/\log(C/2) \rceil$.

1 Introduction

We will consider $n$-dimensional binary vectors (i.e., vectors from $\{0,1\}^n$) and
call them $n$-vectors. The (Hamming) weight $|v|$ of an $n$-vector $v$ is the sum of
the coordinates of $v$. The (Hamming) distance $\rho(u,v)$ between $n$-vectors $u, v$ is
the number of coordinates where $u$ and $v$ differ. The distance ratio of a set $L$
of $n$-vectors is

$$\mathrm{dr}(L) := \frac{\max\{\rho(x,y) : x,y \in L\}}{\min\{\rho(x,y) : x,y \in L,\ x \neq y\}}.$$
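A minimal Python sketch of these definitions, assuming vectors are represented as tuples of 0/1 integers (a representation chosen here only for illustration). The helper names `weight`, `rho` and `dr` are hypothetical; `dr` raises an error for sets with fewer than two distinct vectors, where the minimum in the denominator is undefined.

```python
from itertools import combinations

def weight(v):
    """Hamming weight |v|: the sum of the coordinates of a 0/1 tuple."""
    return sum(v)

def rho(u, v):
    """Hamming distance: the number of coordinates where u and v differ."""
    return sum(ui != vi for ui, vi in zip(u, v))

def dr(L):
    """Distance ratio: max distance over min distance between distinct vectors of L."""
    vs = list(set(L))  # duplicates carry no information, so drop them
    if len(vs) < 2:
        raise ValueError("dr needs at least two distinct vectors")
    dists = [rho(x, y) for x, y in combinations(vs, 2)]
    return max(dists) / min(dists)

# Three 4-vectors of weight 2: the pairwise distances are 2, 4 and 2.
L = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1)]
print(dr(L))  # 2.0
```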

Let $p < n$ be positive integers. Abramovich and Grinshtein [1] asked whether
the following claim holds true:

Claim 1. There exist positive constants $\alpha$ and $C$ such that every set $K$ of at
least $2^p$ $n$-vectors with Hamming weight $p$ contains a subset $K'$ with $|K'| \geq |K|^\alpha$
and $\mathrm{dr}(K') \leq C$.

If the claim is true, it can be used in statistics to establish lower
bounds for the minimax risk of estimation in various sparse settings [1], [3]. If
the claim is not true, a counterexample can be used to impose conditions
on $K$ under which the claim becomes true; it is then still useful for establishing
minimax lower bounds over narrower classes of settings. Also, a weaker bound
on $|K'|$ can be used to obtain weaker lower bounds for the risk of estimation.

The following example shows that the claim holds for some sets $K$. Let
$p < n/2$ and let $\Omega$ denote the set of all $n$-vectors of weight $p$. By Lemma A.3
in [5] (which is a generalization of the Varshamov-Gilbert lemma attributed to
Reynaud-Bouret [2]), there exists a subset $\Omega'$ of $\Omega$ such that $\rho(x,y) \geq (p+1)/4$
for all distinct $x, y \in \Omega'$ and $|\Omega'| \geq (1 + en/p)^{\kappa p}$ for some $\kappa \geq 9 \cdot 10^{-4}$. It
follows that $\mathrm{dr}(\Omega') \leq 8$ (since $\rho(u,v) \leq 2p$ for all $u, v \in \Omega$). Moreover, since
$|\Omega| = \binom{n}{p} \leq (en/p)^p$ and $|\Omega'| \geq (en/p)^{\kappa p}$, we have $|\Omega'| \geq |\Omega|^\kappa$. For sufficiently
large $n/p$, $|\Omega'| \geq 2^p$.
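The arithmetic behind this example can be checked numerically. The Python sketch below (hypothetical helper names) verifies the estimate $\binom{n}{p} \leq (en/p)^p$ in logarithmic form, evaluates the bound $\mathrm{dr}(\Omega') \leq 2p/((p+1)/4) = 8p/(p+1) < 8$, and computes how large $n/p$ must be for the lower bound $(en/p)^{\kappa p}$ to reach $2^p$ when only $\kappa \geq 9 \cdot 10^{-4}$ is used. The resulting threshold $\log_2(n/p) \geq 1/\kappa - \log_2 e$ is a sufficient condition derived from that bound, not a sharp one.

```python
from math import comb, e, log, log2

KAPPA = 9e-4  # lower bound on kappa quoted from Lemma A.3 in [5]

def binom_upper_bound_holds(n, p):
    """Check C(n, p) <= (e n / p)^p, comparing logarithms to avoid overflow."""
    return log(comb(n, p)) <= p * log(e * n / p)

def dr_bound(p):
    """Upper bound on dr(Omega'): max distance 2p over min distance (p + 1)/4."""
    return 2 * p / ((p + 1) / 4)

def log2_ratio_for_2p(kappa=KAPPA):
    """log2 of the n/p beyond which (e n/p)^(kappa p) >= 2^p.

    (e n/p)^(kappa p) >= 2^p  <=>  kappa * log2(e n/p) >= 1
                              <=>  log2(n/p) >= 1/kappa - log2(e).
    """
    return 1 / kappa - log2(e)

print(dr_bound(10))          # 80/11, below 8
print(log2_ratio_for_2p())   # about 1109.7
```

So, using only the guaranteed value of $\kappa$, "sufficiently large" here means $n/p$ of order $2^{1110}$.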

Unfortunately, in general, the claim is not true, and we give a counterexample
to the claim in Section 2. In Section 3 we show that a weaker claim
holds: there exists $K' \subseteq K$ such that $\mathrm{dr}(K') \leq C$ and $|K'| \geq |K|^\alpha$, where
$\alpha = 1/\lceil \log(p/2)/\log(C/2) \rceil$ ($C > 2$).
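The exponent $\alpha = 1/t$ with $t = \lceil \log(p/2)/\log(C/2) \rceil$ can be computed exactly as $t$ = the smallest integer $t \geq 1$ with $(C/2)^t \geq p/2$, which avoids floating-point rounding when the ratio of logarithms is an integer. The Python sketch below is a hypothetical helper; it assumes $C > 2$ is rational and returns $t = 1$ for $p \leq 2$, where the formula itself is not meaningful. For example, $p = 64$ and $C = 4$ give $t = 5$, so $\alpha = 1/5$.

```python
from fractions import Fraction

def levels(p, C):
    """Smallest t >= 1 with (C/2)^t >= p/2, i.e. ceil(log(p/2)/log(C/2)) for p > 2.

    Uses exact rational arithmetic; assumes C > 2 is an int, Fraction or
    exactly representable float.
    """
    C = Fraction(C)
    assert C > 2
    t = 1
    while (C / 2) ** t < Fraction(p, 2):
        t += 1
    return t

print(levels(64, 4))   # 5, since 2^5 = 32 = 64/2
print(levels(65, 4))   # 6, since 2^5 = 32 < 32.5
```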

Henceforth $[s] := \{1, \ldots, s\}$ for a positive integer $s$.

2 Counterexample 

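Let us fix constants $C > 1$ and $0 < \alpha < 1$. We will show that Claim 1 fails for
these $C$ and $\alpha$: there is a set $K$ of at least $2^p$ $n$-vectors of weight $p$ that contains no
subset $K'$ with $|K'| \geq |K|^\alpha$ and $\mathrm{dr}(K') \leq C$. In this section, we will use
fixed positive integers $t, a, p, q$ and $n$ satisfying the following: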

1. $1/t < \alpha$, $a > C$;

2. $p$ is a multiple of $a^t$;

3. $q^t \geq 2^p$;

4. $n \geq p + p(q-1)\sum_{j=1}^{t} (q/a)^{j-1}$.
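Integers satisfying conditions 1 to 4 exist for any $C > 1$ and $0 < \alpha < 1$. One explicit (and far from economical) choice is $t = \lfloor 1/\alpha \rfloor + 1$, $a = \lfloor C \rfloor + 1$, $p = a^t$, $q = 2^{\lceil p/t \rceil}$, and $n$ equal to the ceiling of the bound in condition 4. The Python sketch below is a hypothetical helper computing this choice with exact rational arithmetic; the section itself does not prescribe particular values.

```python
from fractions import Fraction
from math import ceil, floor

def choose_parameters(C, alpha):
    """One admissible choice of (t, a, p, q, n) for conditions 1-4 (illustrative)."""
    assert C > 1 and 0 < alpha < 1
    t = floor(1 / alpha) + 1              # condition 1: 1/t < alpha
    a = floor(C) + 1                      # condition 1: a > C
    p = a ** t                            # condition 2: p is a multiple of a^t
    q = 2 ** (-(-p // t))                 # condition 3: q^t = 2^(t*ceil(p/t)) >= 2^p
    bound = p + p * (q - 1) * sum(Fraction(q, a) ** (j - 1) for j in range(1, t + 1))
    n = ceil(bound)                       # condition 4 (ceil of a Fraction is exact)
    return t, a, p, q, n

print(choose_parameters(2, 0.6))  # (2, 3, 9, 32, 3264)
```

Even for moderate $C$ and $\alpha$ these numbers grow very quickly, since $p = a^t$ and $q$ is exponential in $p/t$.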

We say a set $L$ of $n$-vectors is a $C_0$-set if $L$ consists of a single vector. For
$i \in [t]$, a set $L$ of vectors is a $C_i$-set if it satisfies the following:

1. $|L| = q^i$;

2. $\max\{\rho(x,y) : x,y \in L\} = 2p/a^{t-i}$;

3. $L$ can be partitioned into $q$ sets $L_1, \ldots, L_q$ such that for each $r$, $L_r$ is a
$C_{i-1}$-set, and for all $x \in L_r$, $y \in L_s$ with $r \neq s$, $\rho(x,y) = 2p/a^{t-i}$.
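The recursive definition can be checked mechanically. The Python sketch below assumes $a > 1$ and $q \geq 2$. Under these assumptions, two vectors in the same part $L_r$ are at distance at most $2p/a^{t-i+1} < 2p/a^{t-i}$, while vectors in different parts are at distance exactly $2p/a^{t-i}$, so the parts can be recovered by grouping vectors at distance less than $2p/a^{t-i}$. The function name `is_c_set` and the tuple representation are illustrative choices, not part of the source.

```python
from fractions import Fraction
from itertools import combinations

def rho(u, v):
    """Hamming distance between two 0/1 tuples."""
    return sum(x != y for x, y in zip(u, v))

def is_c_set(L, i, p, a, t, q):
    """Return True iff L satisfies the recursive definition of a C_i-set.

    Assumes a > 1 and q >= 2, so that the parts L_1, ..., L_q are exactly the
    groups of vectors at mutual distance less than 2p/a^(t-i).
    """
    L = list(dict.fromkeys(L))            # drop duplicates, keep order
    if i == 0:
        return len(L) == 1                # a C_0-set is a single vector
    if len(L) != q ** i:                  # condition 1
        return False
    D = Fraction(2 * p, a ** (t - i))     # the distance 2p/a^(t-i)
    if max(rho(x, y) for x, y in combinations(L, 2)) != D:   # condition 2
        return False
    parts = []                            # candidate parts L_1, ..., L_q
    for x in L:
        for part in parts:
            if rho(x, part[0]) < D:
                part.append(x)
                break
        else:
            parts.append([x])
    if len(parts) != q:
        return False
    for P, Q in combinations(parts, 2):   # condition 3: distance D across parts
        if any(rho(x, y) != D for x in P for y in Q):
            return False
    return all(is_c_set(P, i - 1, p, a, t, q) for P in parts)   # parts are C_{i-1}-sets

def vec(ones, n=12):
    """The n-vector with 1s exactly at the coordinates in `ones`."""
    return tuple(1 if j in ones else 0 for j in range(n))

K = [vec({0, 1, 2, 3}), vec({0, 1, 4, 5}), vec({6, 7, 8, 9}), vec({6, 7, 10, 11})]
print(is_c_set(K, 2, p=4, a=2, t=2, q=2))  # True
```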

Lemma 1. For each $i \in [t]$, there is a set $K$ of $n$-vectors such that $K$ is a
$C_i$-set.

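Proof. For a set $L$ of $n$-vectors to be a $C_i$-set, we need that

$$\max\{\rho(x,y) : x,y \in L\} = 2p/a^{t-i}.$$

Since the vectors we construct all have weight $p$, two of them that share exactly $m$
coordinates equal to 1 are at distance $2(p-m)$. So for every pair $x, y \in L$ of distinct
$n$-vectors, there must be a set $I \subseteq [n]$ with $|I| \geq p - p/a^{t-i}$ such that
$x_j = y_j = 1$ for all $j \in I$. In fact, in our construction below we will ensure that in a
$C_i$-set there exists $I \subseteq [n]$ with $|I| \geq p - p/a^{t-i}$ such that $x_j = 1$ for all
$x \in L$ and $j \in I$.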
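The identity used in this step, $\rho(x,y) = 2(p - m)$ for weight-$p$ vectors sharing exactly $m$ ones, holds because each vector has $p - m$ ones outside the common support. A quick randomized check in Python, with hypothetical helper names and vectors as 0/1 tuples:

```python
import random

def rho(u, v):
    """Hamming distance between two 0/1 tuples."""
    return sum(a != b for a, b in zip(u, v))

def common_ones(u, v):
    """Number of coordinates j with u_j = v_j = 1."""
    return sum(a & b for a, b in zip(u, v))

def random_weight_p(n, p, rng):
    """A uniformly random n-vector of weight p."""
    ones = set(rng.sample(range(n), p))
    return tuple(1 if j in ones else 0 for j in range(n))

rng = random.Random(0)
n, p = 20, 7
for _ in range(1000):
    x, y = random_weight_p(n, p, rng), random_weight_p(n, p, rng)
    # Each vector has p - m ones outside the m shared ones, so rho = 2(p - m).
    assert rho(x, y) == 2 * (p - common_ones(x, y))
```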

For some $S \subseteq T \subseteq [n]$, we say a set $L$ of $n$-vectors is a $C_i$-set between $(S, T)$
if $L$ is a $C_i$-set and, for all $x \in L$, $x_r = 1$ if $r \in S$ and $x_r = 0$ if $r \notin T$. We give
a recursive method to construct a $C_i$-set between $(S, T)$ when $|S| = p - p/a^{t-i}$
and $|T|$ is large enough (we calculate the required size of $T$ later). We can then
construct the required set $K$ by constructing a $C_t$-set between $(\emptyset, [n])$.
Given $S, T$, construct a $C_i$-set $L$ between $(S, T)$ as follows. If $i = 0$, return
a single $n$-vector $x$ of Hamming weight $p$ such that $x_r = 1$ for all $r \in S$ and
$x_r = 0$ for all $r \notin T$.

If $i \geq 1$, partition $T \setminus S$ into $q$ sets $T_1, \ldots, T_q$ such that $-1 \leq |T_r| - |T_s| \leq 1$
for all $r, s$. For each $1 \leq r \leq q$, let $S_r$ be a subset of $T_r$ of size $p/a^{t-i} - p/a^{t-i+1}$.
Then for each $r$ construct a $C_{i-1}$-set $L_r$ between $(S \cup S_r, S \cup T_r)$, and let $L$ be
the union of these sets. (Note that $|S \cup S_r| = p - p/a^{t-(i-1)}$, as required for the
recursion.)

Observe that since $|S| = p - p/a^{t-i}$, $\max\{\rho(x,y) : x,y \in L\} \leq 2p/a^{t-i}$.
Furthermore, since $T_1, \ldots, T_q$ are disjoint, for $x \in L_r$, $y \in L_s$ with $r \neq s$,
$\rho(x,y) = 2p/a^{t-i}$. Finally note that $|L| = \sum_{r=1}^{q} |L_r| = q \cdot q^{i-1} = q^i$. Therefore
$L$ satisfies all the conditions of a $C_i$-set between $(S, T)$.
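The recursion can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: vectors are stored as frozensets of their 1-coordinates, $T \setminus S$ is split round-robin (one way to get block sizes differing by at most 1), $S_r$ is taken as the first $|S_r|$ elements of each block, and the name `build_c_set` is hypothetical. It assumes $p$ is a multiple of $a^t$ and that $|T|$ is large enough; the assertions check both.

```python
from itertools import combinations

def build_c_set(i, S, T, p, a, t, q):
    """Build a C_i-set between (S, T) by the recursion of Lemma 1.

    S, T: lists of coordinates with S a subset of T and |S| = p - p/a^(t-i).
    Vectors are returned as frozensets of their 1-coordinates.
    """
    assert len(S) == p - p // a ** (t - i)
    in_S = set(S)
    rest = [r for r in T if r not in in_S]          # the coordinates of T \ S
    if i == 0:
        # A single weight-p vector: all of S plus p - |S| coordinates of T \ S.
        assert len(rest) >= p - len(S), "T is too small"
        return [frozenset(S) | frozenset(rest[: p - len(S)])]
    # Round-robin split of T \ S into q blocks whose sizes differ by at most 1.
    blocks = [rest[r::q] for r in range(q)]
    size = p // a ** (t - i) - p // a ** (t - i + 1)   # |S_r|
    L = []
    for Tr in blocks:
        assert len(Tr) >= size, "T is too small"
        Sr = Tr[:size]
        L += build_c_set(i - 1, S + Sr, S + Tr, p, a, t, q)   # C_{i-1}-set L_r
    return L

def rho(x, y):
    """Hamming distance via the symmetric difference of the supports."""
    return len(x ^ y)

K = build_c_set(2, [], list(range(12)), p=4, a=2, t=2, q=2)
print(sorted({rho(x, y) for x, y in combinations(K, 2)}))  # [4, 8]
```

With $t = 2$, $a = 2$, $p = 4$, $q = 2$ and $n = 12$, this produces four vectors whose pairwise distances are $4 = 2p/a$ within a part and $8 = 2p$ across parts.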

We now calculate a bound $f_i$ such that we can construct a $C_i$-set between
$(S, T)$ when $|S| = p - p/a^{t-i}$ as long as $|T| \geq f_i$.

Clearly $f_0 = p$. For $i > 0$, in the construction above we require that $|S \cup T_r| \geq
f_{i-1}$ for each $1 \leq r \leq q$. Therefore we require

$$f_i = |S| + q(f_{i-1} - |S|) = q f_{i-1} - (q-1)(p - p/a^{t-i}).$$

Observe that this recurrence is satisfied by setting $f_i = p + p(q-1)\sum_{j=1}^{i} q^{j-1}/a^{t-i+j-1}$.

So to construct a $C_t$-set between $(\emptyset, [n])$, it suffices that $n \geq p + p(q-1)\sum_{j=1}^{t} (q/a)^{j-1}$,
which holds by Part 4 of the conditions on $t, a, p, q$ and $n$
given at the beginning of this section. □
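The closed form for $f_i$ can be checked against the recurrence with exact rational arithmetic. The Python sketch below (hypothetical function names) compares $f_i$ computed from $f_0 = p$ and $f_i = q f_{i-1} - (q-1)(p - p/a^{t-i})$ with the closed form; for $t = 2$, $a = 2$, $p = 4$, $q = 2$ both give $f_2 = 12$.

```python
from fractions import Fraction

def f_recursive(i, p, a, t, q):
    """f_0 = p and f_k = q f_{k-1} - (q - 1)(p - p/a^(t-k)) for k = 1, ..., i."""
    f = Fraction(p)
    for k in range(1, i + 1):
        f = q * f - (q - 1) * (p - Fraction(p, a ** (t - k)))
    return f

def f_closed(i, p, a, t, q):
    """f_i = p + p(q - 1) * sum_{j=1}^{i} q^(j-1) / a^(t-i+j-1)."""
    return p + p * (q - 1) * sum(
        Fraction(q ** (j - 1), a ** (t - i + j - 1)) for j in range(1, i + 1)
    )

print(f_recursive(2, 4, 2, 2, 2), f_closed(2, 4, 2, 2, 2))  # 12 12
```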

Theorem 1. There is a set $K$ of $n$-vectors for which Claim 1 does not hold.

Proof. We will construct a set $K$ such that for any subset of $K$ with more
than $q = |K|^{1/t}$ vectors, the distance ratio is at least $a$. Since $\alpha > 1/t$ and $a > C$,
this implies that for any subset with at least $|K|^\alpha$ vectors the distance ratio is greater
than $C$, as required. (Note that $|K| = q^t \geq 2^p$, so $K$ is large enough for Claim 1 to apply.)

By Lemma 1 we may assume that we have a $C_t$-set $K$. More generally, let $K$ be a
$C_i$-set for some $i \in [t]$. Then $K$ can be partitioned into $q$ sets $K_1, \ldots, K_q$ such that
for each $r$, $K_r$ is a $C_{i-1}$-set, and for all $x \in K_r$, $y \in K_s$ with $r \neq s$, $\rho(x,y) = 2p/a^{t-i}$.

Note that any subset $K' \subseteq K$ of more than $q$ vectors will contain at least two
vectors from $K_r$ for some $r$, and so $\min\{\rho(x,y) : x,y \in K',\ x \neq y\} \leq 2p/a^{t-i+1}$;
furthermore, if $K'$ contains vectors from $K_r$ and $K_s$ for $r \neq s$, then
$\max\{\rho(x,y) : x,y \in K'\} \geq 2p/a^{t-i}$.

Therefore, for any $K' \subseteq K$ with $|K'| > q$, either $\mathrm{dr}(K') \geq a$, or $K' \subseteq K_r$
for some $C_{i-1}$-set $K_r$. Furthermore, there is no $K' \subseteq K$ with $|K'| > q$ if $K$ is a
$C_1$-set. So by induction on $i \geq 1$, every $K' \subseteq K$ with $|K'| > q$ has $\mathrm{dr}(K') \geq a$.
By letting $i = t$, we complete the proof of the theorem. □
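For a toy instance the argument can be verified by brute force. The Python sketch below hard-codes a $C_2$-set for the illustrative parameters $t = 2$, $a = 2$, $p = 4$, $q = 2$ (so $|K| = 4$ and $|K|^{1/t} = 2$) and checks that every subset with more than $q$ vectors has distance ratio at least $a$, while some 2-element subset has ratio 1. These parameters violate condition 3 ($|K| < 2^p$), so the example only illustrates the distance-ratio mechanism, not a counterexample to Claim 1.

```python
from itertools import combinations

def rho(x, y):
    """Hamming distance of vectors stored as frozensets of 1-coordinates."""
    return len(x ^ y)

def dr(L):
    """Distance ratio of a collection of at least two distinct vectors."""
    d = [rho(x, y) for x, y in combinations(L, 2)]
    return max(d) / min(d)

# A C_2-set for t = 2, a = 2, p = 4, q = 2 (n = 12 suffices), written out by hand:
# parts {v1, v2} and {v3, v4}; distance 4 = 2p/a inside a part, 8 = 2p across parts.
K = [frozenset(s) for s in ({0, 1, 2, 3}, {0, 1, 4, 5}, {6, 7, 8, 9}, {6, 7, 10, 11})]
q, a = 2, 2

# Smallest distance ratio over all subsets with more than q vectors.
worst = min(dr(Kp) for m in range(q + 1, len(K) + 1) for Kp in combinations(K, m))
# For comparison, the smallest ratio over 2-element subsets.
best_pair = min(dr(Kp) for Kp in combinations(K, 2))
print(worst, best_pair)  # 2.0 1.0
```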

3 Positive Result 

Given a set $K$ of $n$-vectors, we are interested in finding a subset $K' \subseteq K$ as
large as possible such that $\mathrm{dr}(K') \leq C$ for some constant $C$. The following is
such a result.

Theorem 2. Let $K$ be a set of $n$-vectors with Hamming weight exactly $p$, and
let $C > 2$ be a constant. Then there exists $K' \subseteq K$ such that $\mathrm{dr}(K') \leq C$ and
$|K'| \geq |K|^\alpha$, where $\alpha = 1/\lceil \log(p/2)/\log(C/2) \rceil$.
Proof. Let $t = \lceil \log(p/2)/\log(C/2) \rceil = 1/\alpha$.

Let $K_1 = K$. For each $1 \leq i < t$, let $K_{i+1}$ be a maximal subset of $K_i$ such
that $\min\{\rho(x,y) : x,y \in K_{i+1},\ x \neq y\} \geq C^i/2^{i-1}$. For each vector $z \in K_i$, let
$N_i(z)$ be the set of vectors $x \in K_i$ for which $\rho(x,z) < C^i/2^{i-1}$.

Observe that, by the triangle inequality, $\max\{\rho(x,y) : x,y \in N_i(z)\} < 2C^i/2^{i-1} = C^i/2^{i-2}$.
Therefore, since $\min\{\rho(x,y) : x,y \in K_i,\ x \neq y\} \geq C^{i-1}/2^{i-2}$ (for $i = 1$ this is the bound
$\rho(x,y) \geq 2$ for distinct vectors of equal weight), we have $\mathrm{dr}(N_i(z)) < C$. Note furthermore
that by the maximality of $K_{i+1}$, every vector in $K_i$ is in $N_i(x)$ for some $x \in K_{i+1}$. Therefore,
for $1 \leq i < t$, we either have that $|N_i(x)| \geq |K|^\alpha$ for some $x \in K_{i+1}$, in which
case we are done, or $|K|^\alpha |K_{i+1}| > |K_i|$. By induction, we have that $|K_i| \geq
|K|/|K|^{\alpha(i-1)}$ for $1 \leq i \leq t$ (or else we can find a set $N_i(x)$ satisfying the
theorem). In particular, we have that $|K_t| \geq |K|/|K|^{\alpha(t-1)} = |K|/|K|^{1-\alpha} = |K|^\alpha$.

Now observe that $\max\{\rho(x,y) : x,y \in K_t\} \leq 2p$. Furthermore,

$$\min\{\rho(x,y) : x,y \in K_t,\ x \neq y\} \geq C^{t-1}/2^{t-2} = (4/C)(C/2)^t \geq (4/C)(p/2) = 2p/C.$$

Therefore $\mathrm{dr}(K_t) \leq C$. This completes the proof. □
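The proof is constructive, and the Python sketch below follows it step by step. It builds each $K_{i+1}$ greedily (one way of obtaining a maximal subset with the required minimum distance), returns a neighborhood $N_i(x)$ as soon as one has at least $|K|^\alpha$ vectors, and otherwise returns $K_t$. Vectors are frozensets of their 1-coordinates, the name `low_ratio_subset` is hypothetical, and the input is assumed to be a list of distinct vectors of weight exactly $p$ with $C > 2$ rational. The size test compares $|N|^t$ with $|K|$ in integers instead of computing $|K|^{1/t}$ in floating point.

```python
from fractions import Fraction
from itertools import combinations

def rho(x, y):
    """Hamming distance of vectors stored as frozensets of 1-coordinates."""
    return len(x ^ y)

def dr(L):
    """Distance ratio; returns 1.0 for fewer than two vectors (a convention assumed here)."""
    d = [rho(x, y) for x, y in combinations(L, 2)]
    return max(d) / min(d) if d else 1.0

def low_ratio_subset(K, p, C):
    """Return K' within K with dr(K') <= C and |K'| >= |K|^alpha, following Theorem 2."""
    C = Fraction(C)
    t = 1
    while (C / 2) ** t < Fraction(p, 2):    # t = ceil(log(p/2) / log(C/2)), at least 1
        t += 1
    Ki = list(K)                            # K_1 = K
    for i in range(1, t):
        r = C ** i / 2 ** (i - 1)           # the radius C^i / 2^(i-1)
        # Greedy maximal subset K_{i+1} of K_i with pairwise distances >= r.
        nxt = []
        for x in Ki:
            if all(rho(x, y) >= r for y in nxt):
                nxt.append(x)
        # By maximality every vector of K_i lies in N_i(x) for some x in K_{i+1}.
        for x in nxt:
            Ni = [y for y in Ki if rho(x, y) < r]
            if len(Ni) ** t >= len(K):      # |N_i(x)| >= |K|^(1/t), in exact integers
                return Ni
        Ki = nxt
    return Ki                               # K_t, with min distance >= 2p/C

K = [frozenset(s) for s in combinations(range(12), 4)]
Kp = low_ratio_subset(K, p=4, C=3)
print(len(Kp), dr(Kp))   # 33 2.0
```

For example, on all $\binom{12}{4} = 495$ vectors of weight 4 in $\{0,1\}^{12}$ with $C = 3$ (so $t = 2$), the first level already yields a neighborhood of $1 + 4 \cdot 8 = 33$ vectors whose distance ratio is $4/2 = 2$.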